    Sélection itérative de transformations pour la classification d'images

    National audience. In image classification, an effective strategy for learning a classifier that is invariant to certain transformations is to augment the training set with the same examples, but with the transformations applied to them. However, when the set of possible transformations is large, it can be difficult to select a small number of relevant transformations while keeping the training set at a reasonable size. Indeed, not all transformations have the same impact on performance; some can even degrade it. We propose an algorithm for the automatic selection of transformations: at each iteration, the transformation that yields the largest gain in performance is selected. We evaluate our approach on the images of the ImageNet 2010 challenge and improve top-5 accuracy from 70.1% to 74.9%.

    Convolutional Patch Representations for Image Retrieval: an Unsupervised Approach

    International audience. Convolutional neural networks (CNNs) have recently received a lot of attention due to their ability to model local stationary structures in natural images in a multi-scale fashion when all model parameters are learned with supervision. While they achieve excellent performance for image classification when large amounts of labeled visual data are available, their success on unsupervised tasks such as image retrieval has been moderate so far. Our paper focuses on this latter setting and explores several methods for learning patch descriptors without supervision, with application to matching and instance-level retrieval. To that effect, we propose a new family of convolutional descriptors for patch representation, based on the recently introduced convolutional kernel networks. We show that our descriptor, named Patch-CKN, performs better than SIFT as well as other convolutional networks learned by artificially introducing supervision, and is significantly faster to train. To demonstrate its effectiveness, we perform an extensive evaluation on standard benchmarks for patch and image retrieval, where we obtain state-of-the-art results. We also introduce a new dataset called RomePatches, which allows descriptor performance to be studied simultaneously for patch and image retrieval.
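    A minimal sketch of the kind of pipeline this abstract describes: compute a convolutional descriptor for each patch and match patches by cosine similarity. The filter bank below is random, standing in for the unsupervised CKN training the paper proposes, which this sketch does not implement; all names and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv_descriptor(patch, filters):
    """Convolve a grayscale patch with a filter bank, apply ReLU,
    average-pool each response map, and L2-normalize the result."""
    k = filters.shape[1]
    h, w = patch.shape
    responses = []
    for f in filters:
        # "valid" convolution via an explicit sliding window (small sizes only)
        out = np.zeros((h - k + 1, w - k + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = np.sum(patch[i:i + k, j:j + k] * f)
        responses.append(np.maximum(out, 0).mean())  # ReLU + average pooling
    v = np.array(responses)
    return v / (np.linalg.norm(v) + 1e-8)

filters = rng.standard_normal((32, 5, 5))   # 32 random 5x5 filters (untrained)
patches = rng.random((10, 16, 16))          # 10 synthetic 16x16 patches
descs = np.stack([conv_descriptor(p, filters) for p in patches])

# Descriptors are unit-norm, so the dot product is the cosine similarity.
similarity = descs @ descs.T
```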

    Transformation Pursuit for Image Classification

    International audience. A simple approach to learning invariances in image classification consists in augmenting the training set with transformed versions of the original images. However, given a large set of possible transformations, selecting a compact subset is challenging. Indeed, not all transformations are equally informative, and adding uninformative transformations increases training time with no gain in accuracy. We propose a principled algorithm, Image Transformation Pursuit (ITP), for the automatic selection of a compact set of transformations. ITP works in a greedy fashion, selecting at each iteration the transformation that yields the highest accuracy gain. ITP also makes it possible to efficiently explore complex transformations that combine basic ones. We report results on two public benchmarks: the CUB dataset of bird images and the ImageNet 2010 challenge. Using Fisher vector representations, we improve top-1 accuracy on CUB from 28.2% to 45.2%, and top-5 accuracy on ImageNet from 70.1% to 74.9%. We also show significant improvements for deep convnet features: from 47.3% to 55.4% on CUB and from 77.9% to 81.4% on ImageNet.
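    A compact sketch of the greedy loop ITP performs, under stated assumptions: `augment` and `train_and_score` are hypothetical placeholders (the first applies a set of transformations to the training data, the second trains a classifier and returns validation accuracy). The actual ITP also explores compositions of transformations and trains incrementally, which this sketch omits.

```python
def transformation_pursuit(candidates, X_train, y_train, X_val, y_val,
                           augment, train_and_score, n_select=3):
    """Greedily pick the transformations with the highest accuracy gain."""
    selected = []
    best_acc = train_and_score(X_train, y_train, X_val, y_val)
    for _ in range(n_select):
        gains = {}
        for t in candidates:
            if t in selected:
                continue
            Xa, ya = augment(X_train, y_train, selected + [t])
            gains[t] = train_and_score(Xa, ya, X_val, y_val) - best_acc
        if not gains:
            break
        t_best = max(gains, key=gains.get)
        if gains[t_best] <= 0:   # stop early if no transformation helps
            break
        selected.append(t_best)
        best_acc += gains[t_best]
    return selected, best_acc
```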

    Large-scale image classification with trace-norm regularization

    International audience. With the advent of larger image classification datasets such as ImageNet, designing scalable and efficient multi-class classification algorithms is now an important challenge. We introduce a new scalable learning algorithm for large-scale multi-class image classification, based on the multinomial logistic loss and the trace-norm regularization penalty. Reframing the challenging non-smooth optimization problem as a surrogate infinite-dimensional optimization problem with a regular l1-regularization penalty, we propose a simple and provably efficient accelerated coordinate descent algorithm. Furthermore, we show how to perform efficient matrix computations in the compressed domain for quantized dense visual features, scaling up to hundreds of thousands of examples, features with thousands of dimensions, and hundreds of categories. Promising experimental results on the "Fungus", "Ungulate", and "Vehicles" subsets of ImageNet are presented, where we show that our approach performs significantly better than state-of-the-art approaches for Fisher vectors with 16 Gaussians.
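    The paper's own solver is an accelerated coordinate descent on an l1 reformulation; as a simpler illustration of trace-norm regularization itself, the sketch below runs proximal gradient descent on the multinomial logistic loss, using the fact that the proximal operator of the trace norm soft-thresholds singular values. Data, names, and hyperparameters are illustrative.

```python
import numpy as np

def prox_trace_norm(W, tau):
    """Proximal operator of tau * ||W||_*: soft-threshold the singular values."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)  # shift for numerical stability
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def proximal_gradient(X, y, n_classes, lam=0.1, lr=0.1, n_iter=200):
    """Trace-norm-regularized multinomial logistic regression."""
    n, d = X.shape
    W = np.zeros((d, n_classes))
    Y = np.eye(n_classes)[y]          # one-hot labels
    for _ in range(n_iter):
        P = softmax(X @ W)
        grad = X.T @ (P - Y) / n      # gradient of the multinomial logistic loss
        W = prox_trace_norm(W - lr * grad, lr * lam)
    return W

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
y = rng.integers(0, 10, size=200)
W = proximal_gradient(X, y, n_classes=10)   # W comes out (approximately) low-rank
```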

    The INRIA-LIM-VocR and AXES submissions to Trecvid 2014 Multimedia Event Detection

    This paper describes our participation in the 2014 edition of the TrecVid Multimedia Event Detection task. Our system is based on a collection of local visual and audio descriptors, which are aggregated into global descriptors, one for each type of low-level descriptor, using Fisher vectors. Besides these features, we use two features based on convolutional networks: one for the visual channel and one for the audio channel. Additional high-level features are extracted using ASR and OCR. Finally, we use mid-level attribute features based on object and action detectors trained on external datasets. Our two submissions (INRIA-LIM-VocR and AXES) are identical in terms of all components except the ASR system. We present an overview of the features and the classification techniques, and experimentally evaluate our system on TrecVid MED 2011 data.
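    To illustrate the aggregation step, here is a simplified Fisher vector encoder: gradients with respect to the GMM means only, whereas the full encoding used in such systems also includes weight and variance terms. The data is synthetic and the layout assumes diagonal covariances.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fisher_vector_means(local_descs, gmm):
    """Mean-gradient part of the Fisher vector for a set of local descriptors."""
    gamma = gmm.predict_proba(local_descs)   # (N, K) soft assignments
    N, K = gamma.shape
    sigma = np.sqrt(gmm.covariances_)        # (K, d) for diagonal covariances
    parts = []
    for k in range(K):
        diff = (local_descs - gmm.means_[k]) / sigma[k]
        parts.append((gamma[:, k, None] * diff).sum(axis=0)
                     / (N * np.sqrt(gmm.weights_[k])))
    fv = np.concatenate(parts)
    # Power- and L2-normalization, as is standard for Fisher vectors.
    fv = np.sign(fv) * np.sqrt(np.abs(fv))
    return fv / (np.linalg.norm(fv) + 1e-8)

rng = np.random.default_rng(0)
descs = rng.standard_normal((500, 16))                 # fake local descriptors
gmm = GaussianMixture(n_components=8, covariance_type="diag").fit(descs)
clip_repr = fisher_vector_means(descs, gmm)            # one global descriptor
```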

    De l'apprentissage de représentations visuelles robustes aux invariances pour la classification et la recherche d'images

    This dissertation focuses on designing image recognition systems that are robust to geometric variability. Image understanding is a difficult problem: images are two-dimensional projections of 3D objects, and representations that must fall into the same category, for instance objects of the same class in classification, can display significant differences. Our goal is to make systems robust to the right amount of deformations, this amount being automatically determined from data. Our contributions are twofold: we show how to use virtual examples to enforce robustness in image classification systems, and we propose a framework for learning robust low-level descriptors for image retrieval. We first focus on virtual examples, obtained as transformations of real ones. One image generates a set of descriptors (one for each transformation), and we show that data augmentation, i.e., considering them all as i.i.d. samples, is the best-performing way to use them, provided a voting stage with the transformed descriptors is conducted at test time. Because transformations carry various levels of information, can be redundant, and can even be harmful to performance, we propose a new algorithm able to select a set of transformations while maximizing classification accuracy. We show that a small number of transformations is enough to considerably improve performance for this task. We also show how virtual examples can replace real ones for a reduced annotation cost, and report good performance on standard fine-grained classification datasets. In a second part, we aim at improving the local region descriptors used in image retrieval, and in particular propose an alternative to the popular SIFT descriptor. We propose new convolutional descriptors, called Patch-CKN, which are learned without supervision. We introduce a linked patch- and image-retrieval dataset based on structure from motion applied to web-crawled images, and design a method to accurately test the performance of local descriptors at both patch and image level. Our approach outperforms SIFT and all tested convolutional architectures on our patch and image benchmarks, as well as on several state-of-the-art datasets.
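    An illustrative sketch of the test-time voting step mentioned above: average a classifier's scores over transformed versions of a test example (including the identity transformation), then predict the top class. For simplicity the transformations here act directly on feature vectors, whereas the thesis applies them to images before feature extraction.

```python
import numpy as np

def predict_with_voting(clf, x, transformations):
    """Vote over transformed versions of a test example.

    `clf` is any model with a predict_proba method; each transformation maps
    a feature vector to a feature vector. Include `lambda v: v` (identity)
    in `transformations` to keep the original example in the vote.
    """
    versions = [t(x) for t in transformations]
    scores = np.mean([clf.predict_proba(v[None, :])[0] for v in versions],
                     axis=0)
    return scores.argmax()
```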

    Bayes Risk for Large Scale Hierarchical Top-K Image Classification

    Local Convolutional Features with Unsupervised Training for Image Retrieval

    International audience. Patch-level descriptors underlie several important computer vision tasks, such as stereo matching and content-based image retrieval. We introduce a deep convolutional architecture that yields patch-level descriptors, as an alternative to the popular SIFT descriptor for image retrieval. The proposed family of descriptors, called Patch-CKN, adapts the recently introduced Convolutional Kernel Network (CKN), an unsupervised framework for learning convolutional architectures. We present a comparison framework to benchmark current deep convolutional approaches along with Patch-CKN for both patch and image retrieval, including our novel "RomePatches" dataset. Patch-CKN descriptors yield competitive results compared to supervised CNN alternatives on patch and image retrieval.
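    One simple way to score such a patch benchmark, assuming each query patch has a known matching database patch: recall@k over nearest neighbors in descriptor space. This is an illustrative metric, not necessarily the exact protocol used in the paper.

```python
import numpy as np

def patch_retrieval_recall(query_descs, db_descs, gt_index, k=5):
    """Fraction of queries whose ground-truth match (gt_index[i]) appears
    among the k database patches with the most similar descriptors.
    Descriptors are assumed L2-normalized, so dot product = cosine similarity."""
    sims = query_descs @ db_descs.T
    topk = np.argsort(-sims, axis=1)[:, :k]
    hits = [gt_index[i] in topk[i] for i in range(len(gt_index))]
    return float(np.mean(hits))
```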